44 research outputs found

    Using phonetic constraints in acoustic-to-articulatory inversion

    The goal of this work is to recover articulatory information from the speech signal by acoustic-to-articulatory inversion. One of the main difficulties with inversion is that the problem is underdetermined, and inversion methods generally offer no guarantee on the phonetic realism of the inverse solutions. One way to address this issue is to use additional phonetic constraints. Knowledge of the phonetic characteristics of French vowels enables the derivation of reasonable articulatory domains in the space of Maeda parameters: given the formant frequencies (F1, F2, F3) of a speech sample, and thus the vowel identity, an "ideal" articulatory domain can be derived. The space of formant frequencies is partitioned into vowels, using either speaker-specific data or generic information on formants. Each articulatory vector can then be assigned a phonetic score that varies with its distance to the "ideal domain" of the corresponding vowel. Inversion experiments were conducted on isolated vowels and vowel-to-vowel transitions. The articulatory parameters were compared with those obtained without these constraints and with those measured from X-ray data.
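
    The idea of a phonetic score that varies with the distance to an "ideal domain" can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the domains are modelled as simple intervals per parameter, the parameter names and bounds are hypothetical, and the exponential decay is only one possible choice of scoring function.

```python
import math

# Hypothetical "ideal" domains: for each vowel, an allowed interval per
# articulatory parameter (names and values are illustrative, not from the paper).
IDEAL_DOMAINS = {
    "i": {"jaw": (0.0, 1.5), "tongue_body": (-3.0, -1.0), "lip_protrusion": (-3.0, 0.0)},
    "u": {"jaw": (0.0, 1.5), "tongue_body": (1.0, 3.0), "lip_protrusion": (1.0, 3.0)},
}


def distance_to_domain(vector, domain):
    """Euclidean distance from an articulatory vector to a box-shaped domain."""
    squared = 0.0
    for name, (low, high) in domain.items():
        value = vector[name]
        # The contribution is zero when the value lies inside the interval.
        excess = max(low - value, 0.0, value - high)
        squared += excess * excess
    return math.sqrt(squared)


def phonetic_score(vector, vowel, scale=1.0):
    """Score in (0, 1]: 1 inside the domain, decaying with distance outside it."""
    return math.exp(-distance_to_domain(vector, IDEAL_DOMAINS[vowel]) / scale)


inside = {"jaw": 1.0, "tongue_body": -2.0, "lip_protrusion": -1.0}
outside = {"jaw": 1.0, "tongue_body": 2.0, "lip_protrusion": -1.0}
print(phonetic_score(inside, "i"))   # 1.0 (inside the assumed domain)
print(phonetic_score(outside, "i"))  # exp(-3), roughly 0.05
```

    In an inversion setting, such a score could be used to rank or prune candidate articulatory vectors that produce the same formants.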

    Compact representations of the articulatory-to-acoustic mapping

    Articulatory codebooks are very often used to represent the articulatory-to-acoustic mapping. They thus need to be compact while offering very good acoustic precision. This paper presents a method of articulatory codebook construction that is more general than that of Ouni, in the sense that the articulatory-to-acoustic mapping is approximated by multivariable polynomials. The second major contribution concerns the subdivision process, which finds the most efficient subdivision, i.e. the one that minimizes the size of the codebook while guaranteeing very good acoustic precision. Experiments show that the size of the codebook can be divided by a factor of 20 and, simultaneously, the acoustic precision can be improved by a factor of 2 by using second-order polynomials together with this new construction strategy.
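
    The combination of polynomial approximation and subdivision can be sketched in one dimension. This is not the authors' construction: `toy_mapping` is a made-up stand-in for the articulatory-to-acoustic mapping, the subdivision rule is plain bisection rather than an optimized one, and all function names are hypothetical. The sketch only shows the general principle of splitting a region until a second-order polynomial reproduces the mapping within a tolerance.

```python
import math


def toy_mapping(x):
    """Stand-in for an articulatory-to-acoustic mapping (one parameter, one output)."""
    return math.sin(3.0 * x) + 0.5 * x * x


def quadratic_through(f, a, b):
    """Return the quadratic interpolating f at a, the midpoint, and b."""
    m = 0.5 * (a + b)
    fa, fm, fb = f(a), f(m), f(b)

    def poly(x):
        # Lagrange form of the degree-2 interpolating polynomial.
        return (fa * (x - m) * (x - b) / ((a - m) * (a - b))
                + fm * (x - a) * (x - b) / ((m - a) * (m - b))
                + fb * (x - a) * (x - m) / ((b - a) * (b - m)))

    return poly


def build_codebook(f, a, b, tolerance, probes=9):
    """Bisect [a, b] until each cell's quadratic matches f at the probe points."""
    poly = quadratic_through(f, a, b)
    xs = [a + (b - a) * k / (probes - 1) for k in range(probes)]
    error = max(abs(f(x) - poly(x)) for x in xs)
    if error <= tolerance:
        return [(a, b, poly)]
    m = 0.5 * (a + b)
    return (build_codebook(f, a, m, tolerance, probes)
            + build_codebook(f, m, b, tolerance, probes))


cells = build_codebook(toy_mapping, -3.0, 3.0, tolerance=1e-3)
print(len(cells), "cells")
```

    Each cell stores its bounds and a local polynomial, so a higher polynomial order generally allows larger cells for the same precision, which is the trade-off the abstract describes.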

    Improving the Sampling of the Null Space of the Acoustic-to-Articulatory Mapping

    This paper presents a new method for sampling the null space of the acoustic-to-articulatory mapping, which is considerably faster and more accurate than the previous method presented by Ouni and Laprie. This is achieved by using a simple stochastic exploration of the articulatory space instead of complex linear programming techniques. The new method in turn allows for a much faster and more accurate inversion process.
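
    A stochastic exploration of this kind can be illustrated with a small random-walk sketch. It is only an illustration of the general idea, not the algorithm of the paper: `toy_acoustics` is a made-up forward mapping with two articulatory parameters, and the acceptance rule, step size, and tolerance are assumptions.

```python
import random


def toy_acoustics(articulation):
    """Stand-in forward mapping: two articulatory parameters, one acoustic value."""
    x, y = articulation
    return x * x + y * y


def sample_null_space(forward, start, target, tolerance, steps, step_size, rng):
    """Random walk that keeps only moves preserving the acoustic target."""
    current = start
    samples = [start]
    for _ in range(steps):
        candidate = tuple(v + rng.uniform(-step_size, step_size) for v in current)
        # Accept the move only if the acoustic image stays close to the target.
        if abs(forward(candidate) - target) <= tolerance:
            current = candidate
            samples.append(candidate)
    return samples


rng = random.Random(0)
samples = sample_null_space(toy_acoustics, (1.0, 0.0), target=1.0,
                            tolerance=0.05, steps=5000, step_size=0.1, rng=rng)
print(len(samples), "articulatory vectors share (almost) the same acoustics")
```

    In this toy case the accepted samples spread along the circle of radius 1, which plays the role of the set of articulatory configurations that are acoustically equivalent.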

    Adjonction de contraintes visuelles pour l'inversion acoustique-articulatoire (Adding visual constraints to acoustic-to-articulatory inversion)

    The goal of this work is to investigate audiovisual-to-articulatory inversion. It is well established that acoustic-to-articulatory inversion is an underdetermined problem. On the other hand, there is strong evidence that human speakers and listeners exploit the multimodality of speech, and more particularly the articulatory cues: the view of visible articulators, i.e. jaw and lips, improves speech intelligibility. It is thus interesting to add constraints provided by the direct visual observation of the speaker's face. Visible data were obtained by stereo-vision and enable the 3D recovery of jaw and lip movements. These data were processed to fit the nature of the parameters of Maeda's articulatory model. Inversion experiments show that constraints on visible articulatory parameters enable relevant articulatory trajectories to be recovered and substantially reduce the time required to explore the articulatory codebook.

    Inversion acoustique-articulatoire en utilisant des contraintes phonétiques (Acoustic-to-articulatory inversion using phonetic constraints)

    The goal of acoustic-to-articulatory inversion is to recover the position of the articulators from the speech signal. One of the major difficulties of inversion is that an infinite number of vocal tract shapes can produce the same speech spectrum. One way to reduce this difficulty is to constrain the problem further, for example by using visual constraints (in addition to the speech signal, the position of the visible articulators is assumed to be known, which can be achieved with one or more cameras) or phonetic constraints (the phonetic characteristics of French vowels are known, for example). But this difficulty can also become an advantage: by making it possible to obtain all the vocal tract configurations corresponding to a given sound, inversion potentially provides a means of studying compensatory strategies that preserve the acoustics. We will show how the use of phonetically derived constraints considerably reduces the solution space and improves the relevance of the solutions.

    Inversion acoustique-articulatoire en utilisant des contraintes phonétiques (Acoustic-to-articulatory inversion using phonetic constraints)

    Texts from the Journées fédératrices "Perturbations et réajustements, langue et langage", organized in Haguenau in December 2004 by the Réseau des sciences cognitives du Grand Est, Cogniest, the E.A. 1399 Linguistique, langues et paroles -LilPa... The goal of acoustic-to-articulatory inversion is to recover the position of the articulators from the speech signal. One of the major difficulties of inversion is that an infinite number of vocal tract shapes can produce the same speech spectrum. One way to reduce this difficulty is to constrain the problem further, for example by using visual constraints (in addition to the speech signal, the position of the visible articulators is assumed to be known, which can be achieved with one or more cameras) or phonetic constraints (the phonetic characteristics of French vowels are known, for example). But this difficulty can also become an advantage: by making it possible to obtain all the vocal tract configurations corresponding to a given sound, inversion potentially provides a means of studying compensatory strategies that preserve the acoustics. We will show how the use of phonetically derived constraints considerably reduces the solution space and improves the relevance of the solutions.

    Adapting visual data to a linear articulatory model

    The goal of this work is to investigate audiovisual-to-articulatory inversion. It is well established that acoustic-to-articulatory inversion is an underdetermined problem. On the other hand, there is strong evidence that human speakers and listeners exploit the multimodality of speech, and more particularly the articulatory cues: the view of visible articulators, i.e. jaw and lips, improves speech intelligibility. It is thus interesting to add constraints provided by the direct visual observation of the speaker's face. Visible data were obtained by stereo-vision and enable the 3D recovery of jaw and lip movements. These data were processed to fit the nature of the parameters of Maeda's articulatory model. Inversion experiments were conducted.

    Incorporation of phonetic constraints in acoustic-to-articulatory inversion

    This study investigates the use of constraints upon articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker-independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge for French vowels and express authorized domains for one or several articulatory parameters. They were tested in an existing inversion framework that utilizes Maeda's articulatory model and a hypercubic articulatory-acoustic table. Phonetic constraints give rise to a phonetic score that reflects the phonetic consistency of the vocal tract shapes recovered by inversion. Inversion has been applied to vowels articulated by a speaker for whom the corresponding X-ray images are also available. Constraints were evaluated by measuring the distance between vocal tract shapes recovered through inversion and real vocal tract shapes obtained from X-ray images, by investigating the spread of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study shows how acoustic-to-articulatory inversion can be used to explore the acoustical and compensatory articulatory properties of an articulatory model.

    Integration of Real-Time Speech Processing Technologies for Online Gaming

    This work demonstrates an application of several real-time speech technologies in an online gaming scenario. The game developed for this purpose is inspired by the famous television quiz show “Who wants to be a millionaire”, in which multiple-choice questions of increasing difficulty are put to the participant. Text-to-speech synthesis is used to read out the questions and the possible answers to the user, while an automatic speech recognition engine is used to get input from the player in order to proceed through the game. The speech data is recorded from the user with the help of a real-time voice activity detector, which selects speech segments from the input audio data. The developed Java application allows the automatic insertion of new multiple-choice questions of varying complexity, which can then be selected during the game.